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Abstract 

A method for selecting a graphical model for p-vector¬ 
valued stationary Gaussian time series was recently 
proposed by Matsuda and uses the Kullback-Leibler 
divergence measure to define a test statistic. This 
statistic was used in a backward selection procedure, 
but the algorithm is prohibitively expensive for large 
p. A high degree of sparsity is not assumed. We show 
that reformulation in terms of a multiple hypothesis 
test reduces computation time by 0{p^) and simu¬ 
lations support the assertion that power levels are 
attained at least as good as those achieved by Mat- 
suda’s much slower approach. Moreover, the new 
scheme is readily parallelizable for even greater speed 
gains. 

1 Introduction 

There has been much interest in recent years in 
the construction of graphical models from p-vector- 
valued (or multivariate) stationary time series {W} 
where Xt = ..., G t G Z, and ^ de¬ 

notes transposition. The purpose of graphical models 
is to aid visualization of connections between multi¬ 
ple time series: each of the time series is represented 
by one vertex and it is wished to define connections 
via edges between the vertices of the graph. The lack 
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of an edge indicates the lack of a connection between 
the corresponding series. 

Formally, a graph G = {V, E) consists of vertices 
V and edges E, where E C {(j, k) G V xV : j ^ k}. 
(We are considering simple graphs where there are 
no loops from a vertex to itself, nor multiple edges 
between two vertices.) To represent {Xt\ the ver¬ 
tices of the graph correspond to the p individual se¬ 
ries {Xj^t}, so F = {!,...,p}. Edges connect or¬ 
dered pairs of distinct vertices. Edges (j, k) G E for 
which both (j, k) G E and {k^j) G E are called undi¬ 
rected edges. An undirected graph is one with only 
undirected edges and it only represents interaction 
between the series. An edge (j, k) is called directed if 
(j, k) G E, with (fc, j) ^ E. A directed graph is one in 
which all edges are directed and it typically encodes 
directions of influence or of causation between series. 

In this paper we will consider the modelling only 
of undirected graphs. There are p{p— l)/2 unordered 
pairs of vertices for the graph and possible 

distinct graph structures. A high degree of sparsity 
of edges is not assumed. We are interested in both 
moderate p and large p; both are practically impor¬ 
tant and present a challenge to graphical modelling 
when a high degree of sparsity is not assumed. 

The statistical framework for graphical modelling 
of vector-valued time series was begun by Brillinger 
[2] who considered both directed and undirected 
graphs. Two different nonparametric approaches 
were subsequently developed, by Dahlhaus [5] for 
undirected graphs, and by Bach and Jordan [1] for 
directed graphs. 

In the approach of [3] the absence of an edge in the 
graphical model between series j and k is indicated by 
the corresponding partial coherence, being zero at all 
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frequencies /. The partial coherence is a frequency 
domain version of the partial correlation coefficient 
and measures, at a frequency /, the correlation be¬ 
tween series j and k when all other series involved 
are held constant. The partial coherence is denoted 
7|fc.{\jfc}(/)> where {\jk} = {1 < i < p : i ^ j,k}, 
and the ,{\jk} terminology indicates that these se¬ 
ries are held constant. The assessment of the in¬ 
teraction between series j and k thus discounts the 
indirect effects of the other series. Estimated par¬ 
tial coherencies will include sampling variability and 
will never be exactly zero, so that hypothesis test¬ 
ing is required to test edge {j, k) to see if it should 
be declared to be missing. The problem here is that 
the partial coherence for edge (j, k) must be zero- 
tested for every frequency computed: Dahlhaus [3] 
suggested a test based simply on the maximum of the 
nonparametrically-estimated partial coherence over 
the frequency range, but the exact asymptotic null 
distribution of his test statistic is not known and only 
approximations have been used in practice. Never¬ 
theless, this nonparametric approach has seen con¬ 
siderable use [a [HIST]. 

The approach of [T] for directed graphs, while inap¬ 
plicable here, had as a key component the use of the 
Kullback-Leibler (KL) divergence between stationary 
processes, formulated earlier by m- In this paper 
we use the KL divergence for determining undirected 
graphs. 

As an alternative to [5], it was suggested in 
and [IH] to instead use parametric graphical models, 
known as ‘graphical interaction models’ which utilise 
vector autoregressive (VAR) processes to model 
{Xt}. Here the VAR parameters are constrained by 
an associated graph; by then ranging over all the 
2 p(p-i )/2 possible graphs and (typically low) orders of 
the VAR model, an information criterion (IC) can be 
used to select an appropriate model. However such 
an exhaustive search procedure is only suitable for 
small p. 

As an alternative to such exhaustive searches, a 
topology selection scheme which uses a more efficient 
approach was given in m- It uses penalized maxi¬ 
mum likelihood where the penalty term reflects spar¬ 
sity constraints. For every pair of series, the resulting 
partial coherence is then subjected to thresholding to 


determine whether it can be considered to be every¬ 
where null (for determining the missing edges). Hav¬ 
ing thus determined the missing edges, the graph is 
determined and constrained parameters can be esti¬ 
mated. By ranging over a small number of possi¬ 
ble VAR orders and penalty weights, and computing 
an IC in each case, the graph giving the minimum 
value of the IC is selected. Unfortunately, the cor¬ 
rect/optimum level to take for the critical threshold¬ 
ing step is unknown in practice. 

A fully nonparametric approach to graphical mod¬ 
elling has the advantage of avoiding the possibility 
of model misspecification that can arise with para¬ 
metric modelling when addressing real-world data. 
Indeed, Matsuda [T3] proposed the identification of 
a graphical model for {X*} based on the use of 
nonparametrically-estimated Kullback-Leibler (KL) 
divergence between two graphical models. Matsuda’s 
test statistic is simple to compute and its asymp¬ 
totic null distribution is Gaussian. It allows to test 
whether a particular nested subgraph is “correct” — 
in the sense that it contains the true graph — and 
thus to determine if restricting the set of edges poses a 
real constraint. Matsuda used an iterative procedure: 
at each step the null hypothesis that a subgraph with 
one edge less is correct is tested. At each such itera¬ 
tion the test therefore has to be carried out as many 
times as there are edges remaining in the graph; this 
is computationally very costly because of the number 
of test statistics needing to be computed, especially 
for large p. Moreover, for general non-decomposable 
graphs the computation of the test statistic employs 
another iterative procedure to satisfy the constraints 
imposed by the currently selected graph. 

In this paper we introduce a much more efficient 
approach to identifying the model — while still based 
on Matsuda’s test statistic. Instead of an iterative 
procedure, we consider only tests that compare the 
fully connected or saturated graph (alternative) with 
graphs that have exactly one missing edge (null hy¬ 
pothesis). These tests are carried out using the well- 
known Holm method for multiple hypothesis testing. 
The method provides strong familywise error control 
which means that the type I error of rejecting any 
of the tested null hypotheses falsely does not exceed 
the specified significance level. This obviously de- 
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creases the number of tests required as well as the 
computational burden for evaluating the test statis¬ 
tics themselves as iterative fitting algorithms are no 
longer required. Indeed, the number of computations 
for our approach is 0(p^) compared to 0{p^) for Mat- 
suda’s implementation. Additionally, in simulations 
our algorithm achieves power at least as good as that 
achieved by Matsuda’s original and much slower ap¬ 
proach. 

In Section [5] we review background ideas in time 
series graphical modelling (including the concept of 
a correct graph). Section |3] summarizes the construc¬ 
tion of Matsuda’s test statistic and gives a worked 
example showing how it is used in his backward step¬ 
wise selection procedure. In Section 2] we describe 
our much more computationally efficient multiple hy¬ 
pothesis test (MHT) employing Matsuda’s test statis¬ 
tic. The computational efficiencies of the two ap¬ 
proaches are contrasted in Section [SJ justifying the 
OijP') improvement for the MHT algorithm, empiri¬ 
cally illustrated in Section I5 t 1 Statistical powers are 
compared for the two algorithms in Section 16.21 and 
the MHT algorithm is seen to do at least as well as 
Matsuda’s algorithm. That the MHT algorithm per¬ 
forms well for higher-dimensional models (large p), 
and is readily parallelizable for even greater speed 
gains, is shown in Section^ The methodology is sat¬ 
isfactorily applied to p = 10 EEG data in Section [H 
Concluding comments are provided in Section |9l 


2 Graphs and VAR Models 


to obtain the jth residual series defined as = 
^j,t-J2vG{\jk} ajy^uXy^t-u, where the p-2 filters 
{ajv,u,u € Z} give the minimum mean square predic¬ 
tion error. The fcth residual series is defined likewise. 
The sequence = cov{z/j^+.r, is 

called the partial cross-covariance sequence and the 
two residual series are uncorrelated if it is everywhere 
zero. If Xj and Xk are partially uncorrelated we write 
Let (j, fc) 

Then G is called a partial correlation graph. Eor 
Gaussian time series a null partial correlation equates 
to independence between the jth and fcth conditioned 
series, and in this case we have a conditional indepen¬ 
dence graph. 

The Fourier transform of the partial cross¬ 
covariance sequence is the partial cross-spectral den¬ 
sity function, denoted Sjk.{\jk){f)- The partial co¬ 
herence, for —1/2 < / < 1/2, is defined as 


lik.{\jk) if) 


\Sjk.i\jk)if)\^ 

{f)Skk.{\jk) if) 


Since Sjk.(\jk)if) = 0 for all - 1/2 < / < 1/2 
SvjVk,T = 0 for all r £ Z we see that 


(j, k)^E^ Sjk.{\jk)i-) = 0^ l]k.{\jk)i-) = 0- 

Let S{f) denote the spectral matrix of {Xt} at 
frequency /, assumed to exist and be of full rank. 
Denoting the (j, fc)th element of S~^ by the 

partial coherence can be expressed as, (e.g., M), 
"f%.{\M^f) = \S^Hf)\V[S^^if)S'^Hf)], and there¬ 
fore 


Throughout the paper, for a matrix A, Aj}^ refers to 
the (j, fc)th element of A and refers to the (j, fc)th 
element of A~^, unless otherwise stated. Without 
loss of generality {Xt} is taken to have a mean of 
zero. 

2.1 Time Series Graphical Models 


(j, k)iE ^ S^\f) =0, -1/2 </ < 1/2. 

i.e., if Xj and Xk are partially uncorrelated then 
there is a zero in the corresponding entry of the in¬ 
verse spectral matrix [3]. (Partial correlation graph¬ 
ical models for time series are undirected as (?, fc) ^ 
E ^ (fc,jj i E. ) 


The edges between the vertices represent partial cor¬ 
relation between two series, i.e., there is no connec¬ 
tion between nodes j & fc if and only if Xj and X^ 
are partially uncorrelated given X^\ji^y To be pre¬ 
cise, we remove the linear effects of X^^j^y from Xj 


2.2 Correct Graphs 

An important concept in what follows is that of a 
correct graph. Such graphs can be used to identify 
the underlying graphical model for multivariate time 
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series. The following definition is a slightly clarified 
version of that in Ha- 

Definition 1 If [V, E) is the true graphical model 
for {Xt}, then {V,E') is correct for (V,E), if 

S^Hf) = 0, {j,k) i E' and-1/2 < / < 1/2. (1) 

Note that by this definition, if an edge is missing 
in E' it must also be absent in E for (1/ E') to be 
correct. A correct graph iV^E ), when imposed on 
top of {V,E), will completely cover all its edges as 
E C E . Also, the fully saturated graph — contain¬ 
ing all edges between vertices — is correct for any 
graphical model. 

By way of an example, let G = {V, E) in Fig. [T] 
be the true graphical model. Then the fully sat¬ 
urated graph Go = {V,Eo) completely covers G 
and is correct for G. Likewise, Gi = {V,E ) com¬ 
pletely covers G and is correct for G. However, when 
G 2 = {V,E ) is imposed over G the edge between 
{A 2 ,t} and {X/^^t} in G is not covered. So (2,4) ^ E 
but (2,4) G E. Therefore E % E and G 2 is not a 
correct graph for G. 

It should be emphasized that we use the phrasing 
“(H, A) is the true graphical model for {Xt}” and 
reserve the use of the word correct for the special 
context of Definition [TJ 

2.3 VAR Models 

Here we give a very brief summary of some relevant 
results on VAR processes, useful for understanding 
ideas in our simulation examples such as “jointly in¬ 
fluencing.” We stress however that the methodology 
discussed in the paper is more widely applicable. 

{Xt} is a real-valued zero mean p-vector-valued 
autoregressive process of order i, or VARp(£), if it is 
of the form Xt = Yfu=i ^uXt-u+et, where the {#„} 
are pxp coefficient matrices, and = [ei_t,..., Cp^tV 
is a p-vector-valued white noise process with a mean 
vector of zero and covariance matrix If det{ Jp — 

Su=i ^ 0 for all \z\ < 1, where Ip is a p x p 

identity matrix, then the process is stationary [H 
p. 25]. We define $(/) = - ELo and 

4*0 = —Ip- 



Figure 1: Illustration of the concept of a correct 
graph. G is the true graph. As explained in the 
text, Go and Gi are correct for G while G 2 is not. 


Let be the (i,j)th element of where we 

are interested in the case i j. Then is said 
to be the influence from Xj^t-u on X^ t [3]- There is 
no influence from component j on i if ^ij^u = 0,u = 
so that ‘I’y/-) = 0. 

S{f) = $-i(/)S,[$-i(/)]^,-I/2 < / < 
1/2, is the spectral matrix for {Xt} where ^ 
denotes conjugate transpose. Then S~^{f) = 

$^(/) $(/), -1/2 < / < 1/2. If S, = allp it 

follows [3] that if the jth and fcth series do not jointly 
influence another series i ^ j,k (i.e., 4>y (-) = 0 
and/or <l>tfc(-) = 0), then the jth and fcth series will 
be partially uncorrelated if and only if 4>jfe(-) = 0 and 
‘ffe.(-)=0. 

Later we will make use of the VARs)!) model 

Xt = $iXt_i + et (2) 

where et ^ the 5-dimensional Gaussian 

distribution with mean 0 and covariance matrix 

For testing and illustration purposes we will make 
use of several models, named as follows: 
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Model A: Here = J 5 and 


0.2 

0 

- 0.1 

0 

-0.5 

0.4 

- 0.2 

0 

0.2 

0 

- 0.2 

0 

0.3 

0 

0.1 

0.3 

0.1 

0 

0.3 

0 

0 

0 

0 

0.5 

0.2 


By inspection of $ 1 , we see that the set of miss¬ 
ing edges is {(2,3), (2,5), (3,4)}. 


Model B: (Matsuda [IS])- Here Sg = /s and 


0.2 

0 

0.3 

0 

0.3' 

0.3 

- 0.2 

X 

0 

0 

0.2 

X 

0.3 

0 

0 

0.2 

0.3 

0 

0.3 

0 

0.2 

0 

0.2 

0.2 

0.2 


and we consider the cases a; = 0 and 0 . 1 , as used 
in [13]. For a; = 0 the set of missing edges in 
our model is {(2,3), (2,5)}, where we note that 
although entries (3,4) and (4,3) are both zero, 
neither entry (5, 3) nor (5,4) are zero so that se¬ 
ries 3 and 4 jointly influence the 5th, and there¬ 
fore edge (3,4) is not missing. When x = 0.1, 
the set of missing edges is simply {(2, 5)}. 


Model C: This consists of of the form 
with a; = 0 but now with ^ = 75 except that 
entries (1, 2) and (2,1) of are equal to 0.5. 
As a result of these two off-diagonal entries being 
non-zero, instead of missing edges {(2,3), (2, 5)} 
only (2,3) is missing. 


3 Test Statistic 

3.1 Test for Missing Edges 

Given {Xf} with graph {V,E) and spectral matrix 
S{f), consider graph {V,E') and matrix T{f) satis¬ 
fying 

Tjkif) = (j,fc) e A'; = 0, {j,k) i E'. 

(5) 

Unique existence of T{f ) is shown in [141 Lemma 7]. 


Proposition 1 \1‘A p. 4 OI] Given graph {V,E'), if 
T{ f) satisfies the constraints in (0) then (V, E') is 
correct for (U, E) if and only if S{f) = T{f). 

The result in Proposition |T] can be used to deter¬ 
mine whether graph (V,E 2 ) is correct, given (V,Ei) 
is correct, where E 2 C Ei. With {V,Ei) assumed 
correct we have Ti(/) = S{f). If we calculate esti¬ 
mators Ti{f), T 2 {f) using observed data, then in¬ 
tuitively a large difference between them suggests 
T 2 (/) ^ Ti{f) Ri S{f) and by Proposition jT] {V,E 2 ) 
would be deemed incorrect. 

Assuming {V,Ei) is correct, a test can be con¬ 
structed between a null {Hq) and alternative (Ha) 
hypothesis: 


Hq : (U, A 2 ) is correct vs El a ■ (U, A 2 ) is incorrect 


where a measure of divergence between Tj(/) and 
T 2 (/) is used to build the test statistic. 

For example, suppose we want to determine 
whether two series are partially uncorrelated, or in 
fact simply uncorrelated in this case. Define 


■ Snif) S,2{f) ■ 

. SUf) S22if) \ ’ 


( 6 ) 


the estimated spectral matrix. With (V, Ei) being 
the fully saturated model, with the two vertices con¬ 
nected, we can test against (U, E 2 ), the model where 
the vertices aren’t connected. The matrix satisfying 
([5]) for {V,E 2 ) is then 


T2{f) 


Siiif) 0 
0 S22U) _ ■ 


3.2 Spectral Estimator 


(7) 


Given vector observations Xq, ..., Xjv-i, the ma¬ 
trix periodogram estimator 5T)(/) of S{f) takes 
the form = W(/)W^(/), where W{f) = 

Xte“'^’’’7*/-v/X. ST)(/) has unit periodicity. 
Let fj = j/N, the jth Fourier frequency, then given 
a symmetric positive weight sequence {wk} for k = 
with ^ Wfe = 1, the frequency-averaged 
periodogram is 

M 

Sifj)= E (8) 

k^-M 
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This estimator was used by Matsuda m in the 
derivation of his test statistic. It is necessary and suf¬ 
ficient for S{f) to be non-singular that 2M -I- 1 > p, 

i.e., we have p or more non-zero values in our weight 
sequence, e.g., [HI P- 3007]. For consistency of the 
spectral estimator we require M, N ^ oo such that 
M/N —>■ 0; for the finite sample sizes used in practice 
we would expect M » p. M can be chosen using, 
for example, the method of ‘window closing ’ m or 
by cross-validation [TB] . 


3.3 Construction of Test Statistic 

Estimators Ti(/) and T^if) can be found by applying 
the constraints in JS]) to S{fj) in (|5]); the recursion of 
[22] is used for this purpose along with a result from 
PP] which justifies convergence — see [TPl p. 403]. 

To measure the difference between Ti (/) and T 2 U) 
Matsuda m used the estimated Kullback-Leibler di¬ 
vergence, eKL{Ti,T 2 ). With N assumed even this is 


1 

N 


N/2 

^[tr{fi(/,)f2-i(/,)} 


logdet{ri(/j)r2”^(/j)} -P 


Under the following assumptions, Matsuda derived a 
statistic based on eKL{Ti,T 2 ) which has an asymp¬ 
totically standard normal, A/’(0,1), statistic: 


as 


2MN 

1/2 

_Du{m2 - mi)_ 



eKL{Ti,T2) 


Cu{m 2 - mi) 
2M 


(9) 

where mi = #{(j, k) : (j, k) ^ Ei, j < k}, (the num¬ 
ber of missing edges in the model), and Cu,Du are 
constants with values determined by u(-), see [Si- 
Given assumptions 1-4 it follows that [TP] 


• Under Hq, 


ZnITi, T 2 ) as N ^ 00 (10) 


• Under Ha,Zm{Ti,T 2 ) takes the form 
1/2 

KL{S,T2)EOp{[MNYl^) 

( 11 ) 

where S is the true spectral matrix, KL{-, •) de¬ 
notes the true Kullback-Leibler divergence, and 
Op{[MNY/'^) denotes a term of smaller order in 
probability than 

Under Ha, the dominant term of the test statistic, 
the divergence, is positive and it therefore has a one¬ 
sided critical region. So for values of the statistic 
greater than a critical level, Hq is rejected in favour of 
Ha ■ Also from (HU the statistic diverges to infinity at 
rate [MiV]^/^ under Ha, so that the test can be more 
powerful than other standard tests which diverge at 
the rate [l3] . 


2MN 

Du{m 2 - mi) 


1- is a p-vector-valued Gaussian stationary 

process. 

2. S{f) is positive definite for j/j < 1/2. 

3. Sjkif) is twice continuously differentiable for 
j,k = l,...,p and -1/2 < / < 1/2. 

4. M = 0{N^) [M is at most of order N^) for 
1/2 < /3 < 3/4 and the weight sequence {wfc} 
is of the form Wk = u (^) , k = —M,..., M, 
where m(-) is a continuous even function on 
[- 1 / 2 , 1 / 2 ]. 

Matsuda m defined the test statistic Zn{Ti,T 2 ) 


Remark 1 We draw attention to the fact that Mat- 
suda’s statistical results assume that the processes in¬ 
volved are Gaussian. He considered /1.71 p. 407] that 
this might not be a necessity, but presently this is an 
open question. Bach and Jordan m also assumed 
Gaussianity in their study for directed graphs. 

3.4 Matsuda’s Algortihm 

Matsuda [13] used the test statistic ([9]) and the recur¬ 
sion in |22j in a backward stepwise selection algorithm 
to identify the best graphical model for {Xj}. Start 
by setting {V,Eo) equal to the fully saturated graph 
with no missing edges and choose significance level 
a. Set fc = 0 and begin: 
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1. Let (F, El^,), {V,El^,), E^^,) be the 

distinct graphs with one more missing edge than 
{V,Ek). Calculate the test statistics 

— ^N(Tk,TJ:_^_i), i = 1 ,..., Lk, 
with the statistic corresponding to model 

2. With $(•) denoting the standard Gaussian dis¬ 
tribution function, find Ck{oi) satisfying 

Cfc(a) (12) 


Edge 

0 

II 

fc = 1 

fc = 2 

fc = 3 

(1,2) 

53.71 

54.03 

57.02 

67.63 

(1,3) 

12.72 

14.62 

17.55 

17.54 

(1,4) 

22.25 

24.14 

24.14 

23.71 

(1,5) 

67.92 

68.96 

70.12 

79.62 

(2,3) 

0.54 

0.20 

— 

— 

(2,4) 

18.16 

17.82 

17.82 

22.41 

(2,5) 

1.89 

1.90 

1.94 

— 

(3,4) 

0.21 

— 

— 

— 

(3,5) 

5.86 

5.29 

5.50 

5.49 

(4,5) 

73.17 

72.60 

72.60 

77.23 

(7fc(0.05) 

2.53 

2.49 

2.44 

2.39 


and if for all > C'fe(a), then stop the pro¬ 

cedure and select {V, Ek) as the graphical model 
for {XJ. Otherwise, set (y,Ek+i) = {V,eI_^_^) 
where is the smallest statistic calculated. 

3. Set k = k + 1 and loop back to step 1. 

Under the assumption that all are standard 
Gaussian — which they will be asymptotically if 
-^fc+i) is a correct graph — the result 

r Lfc 'I ifc 

> l[P{Zl<Ck{a)} 

= (1 - a) 

means that under the hypothesis that all {V,El_^-^) 
are correct, the type I error rate is asymptotically 
less than a and the critical region is conservative [H 
p. 404], 

Remark 2 Perhaps a more intuitive definition for 
the type I error rate, which we use later, would be the 
probability of not removing an edge when {V, El_^_-^) is 
correct, i.e., it should have been removed. This is be¬ 
cause we know the distribution of Z\f when (U, 
is correct, so this error rate can be calculated. The 
error rate used in the stepwise selection is only rele¬ 
vant in terms of the tests carried out at each step. It 
is unclear how it is related to the overall properties of 
the procedure Q p. 158]. 


Table 1: Test statistics Z^iTk, T]^^) and critical lev¬ 
els C'fc(0.05) for Matsuda’s algorithm 

3.5 Worked Example 

The weight function chosen is Wk = cos(7rfc/2M), k = 
—M,... ,M with M = 64. Numerical evaluation of 
Cu and Du when u{x) = cos(7rx) gives Cu = 0.617 
and Du — 0.446. We consider Model A of Sec- 
tion l2.3l with missing edges {(2, 3), (2, 5), (3,4)}. With 
N = 1024 for simulations of the VAR process, we ran 
Matsuda’s algorithm with significance level a = 0.05. 

Let (V, Eq) be the completely saturated graph. 
The test statistics ^Ar(Tfc, for the potential 

models and the critical levels (^^(O.OS) at which they 
are tested are given in Table [T] The steps are inter¬ 
preted as follows: 

k = 0 : Not all test statistics are above the critical 
level, so the process does not stop; {V,Ei) is set 
to the graph with the edge {(3,4)} missing as 
this had the lowest corresponding test statistic. 

k = 1 : Likewise {V,E 2 ) is set to the graph with 
the edges {(2, 3), (3,4)} missing as (2, 3) had the 
lowest corresponding test statistic. 

k = 2 : Likewise {V,Efi) is set to the graph with the 
edges 1(2,3), (2, 5), (3,4)1 missing as (2,5) had 
the lowest corresponding test statistic. 

fc = 3 : At this step all the statistics are above 
( 73 ( 0 . 05 ); we stop the process here and take 
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{VjEs) as the estimated graph. 

This procedure gave the true final graph for the 
model. 

4 An Efficient Testing Proce¬ 
dure 

4.1 Multiple Hypothesis Testing 

We now introduce a new and much more efficient ap¬ 
proach for identifying the true graphical model for 
{Xt}. While still based on the test statistic defined 
in our method doesn’t update at each iteration. 
Essentially, we carry out Matsuda’s method only for 
k = 0, taking {V,Eo) as the fully saturated graph. 
If the value of the statistic Zn{To,TI) correspond¬ 
ing to graph {V,El) is below an appropriate critical 
level, it is deemed a correct graph and the missing 
edge i should also be missing in the estimated graph¬ 
ical model. We construct our estimated model by 
removing insignificant edges via a MHT. 

Our null hypotheses are of the form Hi : {V,El) 
is correct. The alternative hypothesis in each case is 
the fully connected or saturated graph. Each test is 
thus concerned with whether an edge exists between 
two vertices specified by the value of i. 

Proposition 2 If the graph (V,El) is correct for 
edges corresponding to i = ii,... ,is and incorrect for 
all others, then the graphical model (V, E) for {f^t} 
is the graph with only edges {ii ,..., is} missing. 

Proof: If graph {V, El) is correct and i corresponds 
to the edge {j,k), then by definition S^^{f) = 0 for 
“1/2 < / < 1/2 where S{f) is the spectral matrix of 
the true graphical model. This means that edge (j, k) 
must also be missing in (V, E) and this is the case for 
all i = ii,..., ig. Conversely, if {V,El) is incorrect, 
Si^lf) ^ 0 and (j, fc) must necessarily be in {V,E), 
hence the result. □ 

We can list the L = p{p — l)/2 hypotheses in an 


obvious way: 

Hi : (V, E\) is correct; (1, 2) ^ E 

Hp-i : {V,E^~^) is correct; (l,_p) ^ E 

Hp : {V, E^) is correct; (2, 3) ^ E 

Hl : (V, El) is correct; (p — l,p) ^ E. 

Multiple hypothesis testing may be addressed via the 
maximin stepdown procedure m Sec. 9.2]. With 
Zlf = Zn{To,TI) for i = 1,... ,L and ordered test 
statistics Z^^^ < • • • < the corresponding hy¬ 

potheses . •., -fl(L) can be tested using the max¬ 
imin stepdown procedure: 

• Step 1: if Z^^ < Cl, accept Hi,... ,Hl. 

• Step 2: if Z^'' > Cl but < Cl-i, reject 

i?(i) and accept i?(i),..., 

• Step 1: if Z^^^ > Cl, 

C^l-i+ 2 ), but 
reject -ff(L), ■ • ■,-ff(L-i+ 2 ) 

H{1), ■ ■ ■ , i?(L_i+i). 


• Step L+1: if Z^^ > Cl, ■ ■ ■, Z^'^ > Ci, reject 
Hi,...,Hl. 

Remark 3 For each of these tests To{f) = S{f) and 
TC^if) has only a single zero constraint so that find¬ 
ing it does not require the iterative scheme in \22f . 
Consequently, the test statistics may be assembled 
very easily and efficiently. 

4.2 Critical Levels 

The choice of the critical values Ci,... ,Cl is related 
to the idea of the family-wise error rate (EWER). If Y 


y(L — l + 2) ^ 

. . . , ^ 

< C^L-l+1) 

and accept 
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is the number of true null hypotheses that are falsely 
rejected, then the FWER is defined as P{Y > 1), i.e., 
the probability that at least one true null hypothesis 
will be falsely rejected. It is desired that FWER < a 
for all possible constellations of true and false hy¬ 
potheses, the so-called strong error control [11] (9.3)]. 
This can be achieved using the (conservative) Holm 
approach m p. 363]: at each level the critical value 
can be evaluated using Ci{a) = F~^ (l — , where 

F(-) denotes the common distribution function of the 
test statistic under the null hypothesis, which from 
(HOI is in fact $(•), the standard Gaussian distribu¬ 
tion function, in our case. So we choose our critical 
values according to the easily computed formula 

a(a) = $-1(1-1). (13) 

4.3 Worked Examples 


i 

Missing Edge 

vl*) 

a (0.05) 

10 

(4,5) 

73.17 

2.58 

9 

(1,5) 

67.92 

2.54 

8 

(1,2) 

53.71 

2.50 

7 

(1,4) 

22.25 

2.45 

6 

(2,4) 

18.16 

2.39 

5 

(1,3) 

12.72 

2.33 

4 

(3,5) 

5.86 

2.24 

3 

(2,5) 

1.89 

2.13 

2 

(2,3) 

0.54 

1.96 

1 

(3,4) 

0.21 

1.64 


(■i) 

Table 2: Ordered statistics and critical levels 
a(0.05) for MHT 

and is classified as missing, all other hypotheses are 
rejected. So again the true graphs were found. 


Using the same VAR5(1) observations as in Sec¬ 
tioning we list our L = 10 hypotheses: 


Efficiency Contrast 


Ffi : Edge doesn’t exist between (1,2) 
H 2 : Edge doesn’t exist between (1,3) 


Proposition 3 The number of test statistics calcu¬ 
lated in the Matsuda algorithm is 0{p^) and in the 
MHT is 0(p2). 


Hio : Edge doesn’t exist between (4, 5) 

Ordering the test statistics and including the crit¬ 
ical levels CdO.OS) of (IT3| gives Table [2] We can 
see that > (710(0.05),..., > (74(0.05) and 

< ( 73 ( 0 . 05 ), so we reject 17(io) • ■ ■H(a) and ac¬ 
cept 17(3) ... iJ(i). Note that this means our es¬ 
timated graphical model is the graph with edges 
{(2,5), (2,3), (3,4)} missing, the true graph for the 
model. 

We also compared behaviours of Model B of Sec- 
tion l2.3l using a; = 0, with Model C, the only paramet¬ 
ric difference being that ^ for Model C. The 
former has missing edges {(2,3), (2, 5)} the latter has 
only (2, 3) missing. Constructing a table like Table [2] 
for each we find for Model B that edges (2,3) and (2,5) 
have associated statistics 1.49 and -0.31 and are clas¬ 
sified as missing, all other hypotheses are rejected. 
For Model C edge (2,3) has associated statistics 1.07 


Proof: For Matsuda’s algorithm, assuming the final 
output is the true graphical model with k missing 
edges. 


pip - 1 ) 


p{p - 1 ) 


- 1 


P{P - 1 ) 


-k 


= (14) 


test statistics are calculated, where fc S {0,... ,p(p — 
l)/2}. Setting the ratio of non-edges to total possible 
edges to a, we can write k = for 0 < a < 1. 

Then substituting into (fHl) . the total number of test 
statistics needing to be calculated, n say, satisfies 


4 

n=p 



o(p^), 


where o{p'^) denotes terms of smaller order than p'^. 
For sparsity take 1/2 < a < 1, then asymptotically 
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in p, 
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3p P 


i.e., 0(p^). For the MHT, regardless of the number 
of missing edges in the model, we always calculate 
n = p{p—l)/2 statistics, so asymptotically, n ~ p^/2, 
i.e., 0 {p^). □ 


Clearly the sample size, N, and length of weight 
sequence, 2M + 1, will affect the time it takes to cal¬ 
culate each test statistic. Also, if there is only one 
missing edge in our model, as is the case in the MHT, 
we do not need to iterate in order to find the matrix 
satsfying the constraints in ([5]). If there is more than 
one missing edge, as in all steps of the Matsuda algo¬ 
rithm excluding the first, iteration is required as set 
out in [22] . As the number of iterations must increase 
as more edges are removed from the model for a good 
estimate, we will denote this number at each stage as 
Ik- = 1 in the MHT as we only have to iterate 
once). It can be shown by considering the steps in 
the construction process that computation time for 
each statistic is ~ 2NM -I- Np^lk- 

Combining this with the number of test statistics 
needed to be calculated above, Matsuda’s algorithm 
has a time Ti ^ 2NMp^ -|- Np^lk and for the MHT, 


T2 - 2NMp^ -t Np*. (15) 


So the calculation times T for the tests would be 
expected to be 


T = 


0{p^) 

0{p^) 


for Matsuda’s algorithm; 
for the MHT. 


(16) 




MN 6 

X 10 


Figure 2: Calculation timings in seconds: (a) Ti for 
Matsuda’s algorithm, to the one-sixth power, versus 
p, (b) T2 for the MHT, to the one-quarter power, 
versus p, (c) the ratio of computation times T 1 /T 2 
versus p, and (d) T 2 for the MHT versus MN. Here 
N = 1024, M = 32. 


6 Practical Comparison For 
Small Dimensions 

For small values of p we are able to make direct prac¬ 
tical comparisons of the two algorithms as Matsuda’s 
can still be calculated in a reasonable time period. 

6.1 Timings 

Fig. [5] compares calculation times T in seconds, for 
the tests for N = 1024, M = 32. Fig. |2la) plots 


versus p for Matsuda’s algorithm, while Fig. E^b) 
plots versus p for the MHT. In both plots these 
times increase linearly with p as expected. Fig. He) 
shows the ratio T1/T2, illustrating the rapid increase 
in computation time for Matsuda’s algorithm with p, 
compared to the MHT approach. These results were 
derived by randomly generating a VARp(l) model 
matrix $1 (see the Appendix) for eachp value consid¬ 
ered, and then recording the completion time of each 
algorithm — Matsuda’s or MHT — for that model. 

Fig. Hd) shows, for p = 5 fixed and the MHT, 
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a plot of T 2 versus MN, where M = N/32 and N 
increases from 200 to 9 000. From (flSl) 

NMp-^ dTa ^ 2 32p4 


T 2 - 2NMp^ + 


M d{NM) 




N 


which for large N means that T 2 should have a con¬ 
stant gradient with MN, as seen in Fig.lljd). These 
results were derived by randomly generating a sin¬ 
gle VAR5(1) model matrix €>i, (Appendix), and then 
recording the completion time for the MHT algorithm 
for that model using the different M, N combinations 
specified. 


Edge 

Average 

Standard Error 

(1,2) 

26.93 

4.57 

(1,3) 

37.94 

5.25 

(1,4) 

12.55 

3.10 

(1,5) 

41.39 

5.63 

(2,3) 

0.25 

1.08 

(2,4) 

33.21 

5.03 

(2,5) 

0.34 

1.05 

(3,4) 

1.00 

1.21 

(3,5) 

13.40 

3.39 

(4,5) 

15.39 

3.68 


6.2 Power 

We will compare the results of the MHT approach 
against Matsuda’s algorithm using two different mod¬ 
els. To do this we utilise the concepts of (i) FWER, 
dehned in Section EM and (ii) effective power, the 
probability of rejecting all false hypotheses m 

The first model is the VARs)!) model of ([4|) 
and we consider the cases a; = 0 (missing 

edges {(2, 3), (2, 5)}), and 0.1, (single missing edge 
{(2,5)}), as used in [13]. 

We considered combinations {N,M) of (512,16), 
(1024, 32), (2048,64). Results are based on 600 repli¬ 
cations for each {N,M) pair. 

For a; = 0 to compare the algorithms, we only con¬ 
sider the edges (2, 3), (2, 5), (3,4). This is due to the 
fact that these produce the three borderline statis¬ 
tics and while others may sometime fall outside the 
critical region — i.e., we reject them as edges — this 
is infrequent enough that simply for comparison pur¬ 
poses it is worth saving time by ignoring these. This 
approach is supported by the results in Table|3|which 
used the values N = 2048 and M = 64. (In the 
computations the test statistics for other edges were 
essentially taken to be infinity.) 

The results displayed in Fig. [3| were constructed 
as follows. For the multiple hypothesis test, a was 
varied between 0 and 0.5 in steps of 0.00125, and 
used as in m- The MHT was carried out for each 
of the 600 replications followed by the two steps: 

1. the FWER was recorded as the proportion of 
the replications for which at least one true null 
hypothesis was falsely rejected; 


Table 3: Average and standard error of values of the 
Model B (a; = 0) test statistic for each edge test 
with N = 2048, M = 64. 


X 

missin 

(2,3) 

g edge 
(2,5) 

lypothesis 

(3,4) 

0 

True 

True 

False 

0.1 

False 

True 

False 


Table 4: State of the missing edge hypotheses for 
Model B when x = 0 and x = 0.1. 

2 . the effective power of the test was recorded as the 
proportion of replications for which (3,4) was 
not included as a missing edge. This is essen¬ 
tially the power of the sub-test on the hypotheses 
claiming edges (2, 3), (2, 5), (3,4) to be missing, 
since of these the only hypothesis that is false is 
the (3,4) one; see Table 01 

For Matsuda’s algorithm a parameter /3 was cre¬ 
ated and varied between 0 and 0.5 in steps of 0.00125, 
and then a formed from a = this a is the quan¬ 
tity used in dnj. This approach allowed us to con¬ 
centrate more a values near zero, resulting in a more 
even grid for the resultant FWER. Matsuda’s algo¬ 
rithm was carried out for each of the 600 replications 
and the FWER and effective power recorded. 

Figs. EKa), (c) and (e) show the relationship be¬ 
tween the FWER and effective power for the MHT 
(solid line) and Matsuda’s algorithm (dashed line). 
As can be seen, there is no significant difference in 
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Figure 3: FWER versus effective power for the MHT 
(solid lines) and Matsuda’s algorithm (dashed line) 
for Model B, (|4|), with (a) N = 512, M = 16,2: = 0 
(b) N = 512, M = 16,x = 0.1, (c) N = 1024, M = 
32,x = 0 and (d) N = 1024, M = 32,x = 0.1, (e) 
N = 2048, M = 64,a; = 0 and (f) N = 2048, M = 
64, X = 0.1. 


the power of the test for the two methods. 

For the case x = 0.1 we see from Table |4] that 
the hypotheses stating (2,3) and (3,4) to be missing 
edges are both false. So the same basic procedure is 
carried out as for a; = 0 but now the effective power 
is computed as the probability of rejecting both the 
hypotheses involving (2,3) and (3,4). The results are 
shown in Figs. [31(b), (d) and (f) from which it is seen 
that again the MHT does at least as well as Matsuda’s 
algorithm. 

Turning to model A of Section [2Al with given 
in m and missing edges {(2, 3), (2, 5), (3,4)}, we can 
see in Table |S] that the only other ‘boundary edge’ is 
(3,5). 

Again we considered combinations {N,M) of 
(512,16), (1024,32), (2048,64) and used 600 repli¬ 
cations for each {N, M) pair. The results were cal- 


Q) 

g 


O 

CL 

LU 


Q) 

g 

O 

CL 

LU 




0 0.2 0.4 

EWER 


Figure 4: FWER versus effective power for the MHT 
(solid lines) and Matsuda’s algorithm (dashed line) 
for Model A, ([3|), and (a) N = 512, M = 16 (b) 
N = 1024, M = 32 and (c) N = 2048, M = 64. 


Edge 

Average 

Standard Error 

(1,2) 

50.00 

5.93 

(1,3) 

15.74 

3.52 

(1,4) 

22.02 

4.34 

(1,5) 

64.12 

6.79 

(2,3) 

0.29 

1.06 

(2,4) 

15.66 

3.38 

(2,5) 

0.27 

1.06 

(3,4) 

0.32 

1.05 

(3,5) 

3.86 

1.95 

(4,5) 

66.09 

6.61 


Table 5: Average and standard error of values of the 
model 2 test statistic for each edge test with N = 


2048 and M = 64. 


culated using the same method as above, the only 
difference being the effective power is now the power 
of the sub-test on hypotheses claiming the edges 
(2, 3), (2, 5), (3,4), (3, 5) to be missing. Of these, the 
false hypothesis is that stating (3, 5) to be a missing 
edge. 

Fig. m compares the FWER and effective power for 
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Figure 5: Calculation timings in seconds for the MHT 
algorithm as p varies from 10 to 50. Here N = 2048 
and M = 128. 



p = 10 : 29 

II 

CO 

o 

p = 30 : 50 

Type I 

2.2 

3.0 

4.1 

Type H 

1.3 

2.4 

2.9 


Table 6: Average type 1 and 11 percentage errors 

the MHT and Matsuda’s algorithm. Again, there is 
no significant difference in the power of the test for 
the two methods. 

7 MHT Algorithm For Higher 
Dimensions 

We have shown that the MHT approach performs 
well for a relatively small number of dimensions p. 
We now look at higher dimensions. 

7.1 Timings 

It might be thought that the inefficiency of Matsuda’s 
algorithm is not of concern for such moderately large 
p, given modern computing power. However Fig. [S] 
gives timings (see Section lSTl) for the MHT algorithm 
in seconds for p from 10 to 50 (using a 3GHz proces¬ 
sor). Here N = 2048 and M = 128. For p = 50 
the time taken was about 220s; if this is scaled up 
(crudely) for Matsuda’s algorithm by = 2500 we 
arrive at a time of over 6 days. 


7.2 Accuracy 

Table [5] reports the average type I and type H per¬ 
centage errors encountered in the model estimation 
when a = 0.05. Here averaging is (i) over the 20 esti¬ 
mated models for p = 10 : 29 (first column), (ii) over 
100 repeat simulations for the single case p = 30 (sec¬ 
ond column), and (hi) over the 21 estimated models 
for p = 30 : 50 (third column). The type I percentage 
error is here the ratio 100(number of edges accepted 
when missing)/(number missing) and the type H per¬ 
centage error is the ratio 100(number of edges deleted 
when present in the true graph)/(number present). 

Fig. El gives the type I and H percentage errors 
when p is fixed at the large value p = 150 and a is 
varied. These results were derived using a VARp(l) 
model matrix (see the Appendix) giving rise 
to a true graphical model with 36% of connections 
present. The results seem quite satisfactory and be¬ 
have in the reciprocal way expected. 

7.3 Parallelizability 

In contrast to Matsuda’s implementation there is no 
dependency between the calculation of each of the 
test statistics. On a multicore CPU a test statistic 
can be assigned to each core, and upon completion 
the next statistic needing calculation is assigned. Due 
to the small overheads this introduces, compute-time 
minus a (near) constant factor for calculating the 
frequency-averaged periodogram is simply inversely 
proportional to the number of cores used. The fact 
that for large p most time is spent in calculating the 
test statistics means that our algorithm can be ef¬ 
fectively scaled for higher dimensionality just by us¬ 
ing more processor cores. The example of Fig. [3 il¬ 
lustrates the inverse proportionality, using an 8 core 
processor. 

8 Application to EEG Data 

We now apply the MHT method to electroencephalo¬ 
gram (EEG) data, (resting conditions with eyes 
closed), for 33 males, 19 diagnosed with negative- 
syndrome schizophrenia, and 24 controls. This rare 
heritage clinical dataset from unmedicated patients 
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Figure 6: Type I and II percentage errors for p = 150 
as a is varied. Here IV = 2048, M = 512. 



1/(number of cores) 

Figure 7: Calculation timings in seconds for the MHT 
algorithm for p = 60, iV = 2048, M = 512 against the 
reciprocal number of cores, as the number of cores 
varies from 1 (right of plot) to 8 (left). 

was discussed in detail in m- Interest is in detect¬ 
ing any differences in patterns of brain connectivity 
between the groups. 

For each individual EEC was recorded on the scalp 
at 10 sites so that {X*} is a p = 10 vector-valued 
process. There are = 2“^® possible graph 

structures, and p(p — l)/2 = 45 possible connections 
between the series (edges to the graph). Each possi¬ 
ble connection was assigned a connection index from 
1 to 45 as given in m- 

For illustration purposes, the ten channel time 
series for one of the negative-syndrome patients is 
shown in Fig. [H] For each of the negative-syndrome 
patients the MHT algorithm was used to determine 
whether an index-z connection existed, and the per¬ 
centage of the group of patients exhibiting this con- 



0 10 20 30 40 50 

time (s) 


Figure 8: Ten channel EEG time series for one of the 
negative-syndrome patients. 



Figure 9: Percentage of negative-syndrome patients 
(heavy line) and controls (thin line) exhibiting a spec¬ 
ified connection. {N = 1024, M = 20, a = 0.01). 

nection was recorded. The same was done for the 
control group. Fig. [Ogives the resulting percentages 
for each connection and both groups. For 3/4 of the 
connections the percentage is lower for the controls, 
suggesting patients exhibit a tendency towards higher 
connectivity, a result consistent with m where com¬ 
pletely different methodology was used. 

9 Concluding Discussion 

Matsuda’s approach to identification of a graphical 
model involves an appealing Kullback-Leibler statis¬ 
tic but, while improving on exhaustive search ap- 
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proaches, his implementation using a backward step¬ 
wise selection is extremely heavy computationally. 
This paper introduced a multiple hypothesis test im¬ 
plementation using Matsuda’s statistic. The number 
of statistics needing to be calculated is reduced by 
0 {p^) and the computational burden for evaluating 
the test statistics themselves is notably reduced as 
iterative fitting algorithms are no longer required. 

The MHT approach allows us to derive a more rel¬ 
evant control on the error rate in contrast to the step¬ 
wise procedure where the error rate used in each test 
step doesn’t have a clear link to the total error of the 
procedure. The type I error rate we are controlling 
is the probability of failing to delete an edge when 
it is missing in the true graphical model. It may be 
more intuitive to define the error as deleting an edge 
that is contained in the true graph. In order to do 
this we would have to accurately know the distribu¬ 
tion of the test statistic under this alternative, which 
unfortunately we don’t know this. 

The conservative nature of the Holm approach can 
in theory be somewhat offset by using an adaptive ap¬ 
proach, (explained in detail by Guo [9]), particularly 
for large p. The result is a more powerful test than the 
standard Holm procedure and although the FWER 
will be higher, Guo showed it still controls the FWER 
asymptotically. We implemented this methodology 
but for our examples and the values of p utilised, dif¬ 
ferences were very small; however, this approach is 
undoubtedly worthy of further investigation. 

It is possible using our method to conduct an ef¬ 
ficient stepwise approach by running the MHT and 
keeping all edges that clearly exist (i.e., have a very 
large test statistic), thus defining a new To to that 
used previously. Much of the work is thus completed. 
Then the MHT can be re-run to test models differ¬ 
ing from To by one edge, but such additional steps 
require the iterative scheme |22) . 

Finally, we have shown that the algorithm scales 
very well — is highly parallelizable — with appro¬ 
priate computing resources. Future work would in¬ 
volve rendering the algorithm for efficient calculation 
on high performance computing hardware such as 
GPUs. 


Appendix: Random Model Construc¬ 
tion 

For our simulations random VARp(I) models were 
constructed by randomly formulating p x p matrices 
$1 with the number of zero entries specified as fol¬ 
lows. 

For a given p value e. p x p matrix was con¬ 
structed with null entries. All diagonal elements 
and non-diagonal elements in position {i,j) for which 
(* + j)mod fc = I were populated by random values 
sampled from the Af{0, 1) distribution. The matrix 
was then subject to spectral decomposition and any 
eigenvalues with modulus greater than unity were re¬ 
placed by their reciprocals and reconstructed us¬ 
ing the modified eigenvalues. For such a we know 
det{/p — $iz} ^ 0 for all \z\ < 1, [TH pp. 15 & 653] 
and so a stationary process results. The choice of k 
controls the sparsity; our default choice fc = 5 makes 
approximately 64% of the matrix entries zero for 
p=10: 50. 
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